fix(dcutr): handle empty holepunch_candidates #5583
Description
A few months ago, we occasionally saw odd failures with `DCUTr`. After some investigation, the cause turned out to be a race condition: sometimes `identify` is a little slow, and the `DCUTr` handler is created before an `identify` event is received. On its own, this is not necessarily a problem. But if the race happens while the hole-punch candidate list of `DCUTr` is empty, then `DCUTr` will always fail for this connection. When a new relayed connection is established, `DCUTr` creates a `Handler` for that connection, which is responsible for performing the hole punch. However, the candidates used are the ones known at `Handler` instantiation, so any later `NewExternalAddrCandidate` updates are not forwarded to the `Handler`s.

This PR upstreams the fix we made several months ago; we have not encountered any particular problem with it since.
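The race described above can be illustrated with a deliberately simplified model. This is only a sketch: `Behaviour`, `Handler`, `forward_updates`, and every method name below are hypothetical stand-ins, not the real libp2p-dcutr types, and forwarding late candidates is just one possible remedy, not necessarily what this PR implements.

```rust
// Simplified model of the race; none of these types are the real libp2p-dcutr API.

/// Hypothetical per-connection handler holding its own copy of the candidates.
#[derive(Debug, Default)]
struct Handler {
    candidates: Vec<String>,
}

impl Handler {
    /// A hole-punch handshake only makes sense with at least one candidate.
    fn can_start_handshake(&self) -> bool {
        !self.candidates.is_empty()
    }
}

/// Hypothetical behaviour owning the candidate list and the handlers.
#[derive(Debug, Default)]
struct Behaviour {
    candidates: Vec<String>,
    handlers: Vec<Handler>,
    /// Assumption for illustration: whether late candidates reach existing handlers.
    forward_updates: bool,
}

impl Behaviour {
    fn new(forward_updates: bool) -> Self {
        Behaviour {
            forward_updates,
            ..Default::default()
        }
    }

    /// A relayed connection is established: the handler gets a snapshot
    /// of the candidates known right now. Returns the handler's index.
    fn on_relayed_connection(&mut self) -> usize {
        self.handlers.push(Handler {
            candidates: self.candidates.clone(),
        });
        self.handlers.len() - 1
    }

    /// Models a late `NewExternalAddrCandidate`, e.g. after a slow identify.
    fn on_new_candidate(&mut self, addr: &str) {
        self.candidates.push(addr.to_owned());
        if self.forward_updates {
            // Push the new candidate to handlers that already exist.
            for handler in &mut self.handlers {
                handler.candidates.push(addr.to_owned());
            }
        }
    }
}
```

With `forward_updates` set to `false`, a handler created before the first candidate arrives keeps an empty list and can never start a handshake; with `true`, the same ordering of events leaves the handler with a usable candidate.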
Timetable of the problem
OutboundError(NoAddress)
InboundError(UnexpectedEof)
Notes & open questions
I have added some `TODO`s about potentially merging the `self.attempts += 1` increments. When `inbound`, `self.attempts` is incremented when starting a handshake; when `outbound`, however, `self.attempts` is incremented at the "new outbound substream" request. I don't think this was a problem before, but now that we do not necessarily trigger a handshake when there are no hole-punch candidates, I think we might want to increment `self.attempts` only when actually starting the handshake. What do you think?

The log emitted when starting a new handshake says that, if the corresponding stream (`inbound_stream` or `outbound_stream`) was not empty, we replace the handshake: there is a `warn`-level log stating `New inbound/outbound connect stream while still upgrading previous one. Replacing previous with new`. However, reading the code of the `FuturesSet::try_push` method and then the `FuturesMap::try_push` method (which it uses internally), the pushed future never replaces an old one when capacity is reached; the call just returns an error. So what do you think should be done? Should we actually replace the old with the new, as the log says? Or should we keep the old one and update the log to say that the new one was dropped?

Change checklist